RaPiDS: an algorithm for rapid expression profile database search.
Authors
Abstract
In this paper we present a fast algorithm and implementation for computing the Spearman rank correlation (SRC) between a query expression profile and each expression profile in a database of profiles. The algorithm runs in time linear in the size of the profile database, with a very small constant factor, and is designed to handle multiple profile platforms and missing values efficiently. We show that our specialized algorithm and C++ implementation achieve an approximately 100-fold speed-up over a reasonable baseline implementation using Perl hash tables. RaPiDS is designed for general similarity search rather than classification; however, to assess the usefulness of SRC as a similarity measure, we evaluate the program as a classifier of normal human cell types based on gene expression. Specifically, we use the k-nearest-neighbor classifier with a t statistic derived from SRC as the similarity measure for profile pairs, and we estimate its accuracy with a jackknife test on microarray data with manually checked cell type annotation. Preliminary results suggest that the measure is useful for profiles measured under similar conditions (same laboratory and chip platform), with 64% accuracy on 1,685 profiles versus 17.5% for the majority-class classifier, but that it requires improvement when comparing profiles from different experimental series.
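The abstract does not spell out the algorithm. As a rough illustration only, the Python sketch below computes SRC between a query and every database profile in one pass. It converts each SRC to a t statistic, classifies by k-nearest-neighbor vote, and estimates accuracy with a leave-one-out (jackknife) test. Several choices here are assumptions, not taken from the paper:

- Profiles are dicts mapping gene IDs to values, as a stand-in for matching genes across platforms.
- Missing values are `None` and are handled by dropping genes not measured in both profiles.
- Tied values receive average ranks.
- SRC is converted with the standard formula t = rho * sqrt((n - 2) / (1 - rho^2)), where n is the number of shared genes.

All function names are hypothetical. This naive version re-ranks every pair, so it costs O(g log g) per profile for g shared genes. It is still linear in the number of database profiles, but it does not reproduce the optimizations behind the reported speed-up.

```python
import math
from collections import Counter

# Hypothetical sketch; NOT the RaPiDS implementation.
# Assumption: a profile is a dict gene_id -> value, with None marking a missing value.


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's rho with ties."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has undefined correlation; treat it as 0
    return sxy / math.sqrt(sxx * syy)


def spearman(query, profile):
    """SRC over genes present and non-missing in both profiles -> (rho, n_shared)."""
    shared = [g for g in query
              if g in profile and query[g] is not None and profile[g] is not None]
    if len(shared) < 3:
        return 0.0, len(shared)  # too few shared genes for a meaningful score
    rx = average_ranks([query[g] for g in shared])
    ry = average_ranks([profile[g] for g in shared])
    return pearson(rx, ry), len(shared)


def t_statistic(rho, n):
    """Assumed conversion of a correlation to a t statistic with n - 2 df."""
    if n < 3:
        return 0.0
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)
    return rho * math.sqrt((n - 2) / (1 - rho * rho))


def knn_classify(query, database, labels, k=3):
    """Score every database profile in one linear pass, then vote among the top k."""
    scored = []
    for pid, profile in database.items():
        rho, n = spearman(query, profile)
        scored.append((t_statistic(rho, n), pid))
    scored.sort(key=lambda s: s[0], reverse=True)  # larger t = more similar
    votes = Counter(labels[pid] for _, pid in scored[:k])
    return votes.most_common(1)[0][0]


def jackknife_accuracy(database, labels, k=3):
    """Leave-one-out: classify each profile against all the others."""
    correct = 0
    for pid, profile in database.items():
        rest = {q: p for q, p in database.items() if q != pid}
        correct += knn_classify(profile, rest, labels, k) == labels[pid]
    return correct / len(database)
```

For example, `jackknife_accuracy(db, labels, k=1)` classifies each profile against all the others. The abstract does not state the value of k, so it is left as a parameter here.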
Similar resources
CellMontage: Similar Expression Profile Search Server
The establishment and rapid expansion of microarray databases has created a need for new search tools. Here we present CellMontage, the first server for expression profile similarity search over a large database (69,000 microarray experiments derived from NCBI's GEO site). CellMontage provides a novel, content-based search engine for accessing gene expression data. Microarray experiments with si...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
AN OPTIMIZED NEURO-FUZZY GROUP METHOD OF DATA HANDLING SYSTEM BASED ON GRAVITATIONAL SEARCH ALGORITHM FOR EVALUATION OF LATERAL GROUND DISPLACEMENTS
During an earthquake, significant damage can result due to instability of the soil in the area affected by internal seismic waves. A liquefaction-induced lateral ground displacement has been a very damaging type of ground failure during past strong earthquakes. In this study, neuro-fuzzy group method of data handling (NF-GMDH) is utilized for assessment of lateral displacement in both ground sl...
Parameters Assignment of Electric Train Controller by Using Gravitational Search Optimization Algorithm
The speed profile of the train will be determined according to criteria such as safety, travel convenience, and the type of electric motor used for traction. Due to the passengers and cargo on the train, the electric train load is constantly changing. This will require reassigning the speed controller’s parameters of the electric train. For this purpose, the Gravitational Search optimization Al...
A graph search algorithm: Optimal placement of passive harmonic filters in a power system
The harmonic in distribution systems becomes an important problem due to an increase in nonlinear loads. This paper presents a new approach based on a graph algorithm for optimum placement of passive harmonic filters in a multi-bus system, which suffers from harmonic current sources. The objective of this paper is to minimize the network loss, the cost of the filter and the total harmonic disto...
Journal title:
- Genome informatics. International Conference on Genome Informatics
Volume 17, Issue 2
Pages: -
Publication date: 2006